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An analysis of the average properties of a local search resolution procedure for the satisfaction 
of random Boolean constraints is presented. Depending on the ratio a of constraints per variable, 
resolution takes a time T res growing linearly (T res ~ T res (a) N,a < ad) or exponentially (T res ~ 
exp(7V C( Q ))i a > a d) with the size N of the instance. The relaxation time r re3 (a) in the linear 
phase is calculated through a systematic expansion scheme based on a quantum formulation of the 
evolution operator. For a > aa, the system is trapped in some metastable state, and resolution 
. occurs from escape from this state through crossing of a large barrier. An annealed calculation of 

*m ««. • the height C( a ) °f this barrier is proposed. The polynomial/exponentiel cross-over ad is not related 

^— ^ ' to the onset of clustering among solutions. 

£ 

i. introduction 

On i 

The study of combinatorial problems^] with statistical physics techniques started almost twenty years ago@. Most 
' of the efforts have been devoted to the calculation of the optimal solution of various problems (traveling salesman, 
matching, graph or number partitioning, satisfiability of Boolean constraints, vertex cover of graphs...) as a function of 
£NJ ' the definition parameters of their inputs distributions. Central to these studies is the characterization of the properties 
of the extrema of correlated random variables, a question of considerable importance in probability theory [3(. From 
a computer science point of view, however, the main point is the characterization of resolution times. Concepts and 
tools issued from the analysis of algorithms have allowed so far to understand the behaviour and the efficiency of many 
algorithms of practical use |j, |5(, sorting for instance, but progress has been much slower in the analysis of search 
procedures for combinatorial problems. These are sophisticated algorithms hardly amenable to rigorous analysis with 
available techniques p|. Statistical physics ideas and approximation techniques may then be of great relevance to help 
develop quantitative understanding and intuition about the operation of these algorithms. To some extent, the study 
of algorithms may be seen as part of out-of-equilibrium statistical physics. 
I 1 There exists a wide variety of algorithms for combinatorial problems^]. Roughly speaking, two main classes 
may be identified. The first one includes complete algorithms, guaranteed to provide the optimal solution. They 
essentially consist in an exhaustive albeit clever (making use of branch-and-bound procedures) search through the 
configuration space, and may require very large computational times, i.e. scaling exponentially with the size of the 
inputs to be treated. Recently, notions borrowed from statistical physics as real-space renormalization and out-of- 
equilibrium growth processes allowed to reach some understanding of the operation of complete algorithms for the 
satisfiability^^] and the vertex coverjtj^j problems over random classes of inputs. Incomplete algorithms constitute 
j_j ■ another large class of resolution procedures: they may be able to find the optimal solution very quickly, but may also 
run forever without ever finding it. An example is provided by local procedures e.g. Monte Carlo dynamics which 
attempt at finding a solution from an arbitrary initial configuration through a sequence of stochastic local moves in 
the configuration space. When specialized to decision problems (for which the desidered output is the answer YES 
or NO to a question related to the inputs, as: is there a way to color a given graph with 7 colors only?), local search 
algorithms can sometimes be made one-sided error probabilistic algorithms llj. When they stop, the answer is YES 
with certainty. If they run for a time t without halting, the probability (over the non deterministic choices of local 
moves in the configuration space) that the correct answer is NO is bounded from below by a function f(t, N) going 
to 1 when t — > oo. Obviously this function depends on the size N of the input: the larger the input, the larger the 
time t c (N) it takes to reach, say, a 99% confidence that the answer is NO i.e. f(t c (N),N) = 0.99. Determining the 
scaling (polynomial or exponential) of t c (N) with N is of capital importance to assess the efficiency of the alg orithm. 

In this paper, we study this question for a local search procedure, the RandomWalkSAT algorithm^!, and a 
decision problem defined over an easy-to-parametrize class of inputs, the random satisfiability problem |l3fl. Both 
procedure and problem are defined in Section [HI We also recall the main results proven by mathematicians on these 
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issues, and present an overview of the phenomenology of Random WalkS AT. A major theoretical interest for studying 
the Random WalkSAT algorithm is that it is, apparently, of purely dynamical nature. Detailed balanced is indeed 
verified in a trivial way: the equilibrium measure is non zero over solutions only, and the transition rates from a 
solution to any other configuration are null. The equilibrium measure is therefore of no use to understand long time 
dynamics, a situation reminiscent of some models studied in out-of-cquilibrium physics e.g. the contact process for 
finite size systems^). We are thus left with a study of the dynamical evolution of a spin system with disorder in 
the interactions (the instances of the combinatorial problem to be solved are random) , a still largely open problem 
in statistical physics[l5j. We show in Section IlIII how the master equation for this evolution can be written as a 
Hamiltonian (in imaginary time) for 1/2 quantum spin systems, and use this represention to get exact results and 
develop systematic expansions for the quantities of interest, valid in some region of the parameter space. In Sect ion HVI 
we present an approximate analysis of the RandomWalkSAT procedure in the whole parameter space. We show that, 
depending on the value of the ratio a of the number of constraints per (Boolean) degree of freedom, resolution is 
either achieved in linear time or requires the escape from some metastable region in the configuration space, a slow 
process taking place over exponentially large times. Interes ting ly, the dynamics generated by RandomWalkSAT is 
very similar to the physical dynamics of (spin) glassy systems yjfl. Some perspectives are presented in Section^ Note 
that a complementary study of the RandomWalkSAT procedure was very recently carried out by Barthel, Hartmann 
and Weigt0. 

II. DEFINITIONS, KNOWN RESULTS AND PHENOMENOLOGY 
A. The random /^-Satisfiability problem 

The 3-Satisfiability (3-SAT) decision problem is defined as follows. Consider a set of N boolean variables Xi, 
i = 1, . . . , N. A literal is either a variable Xi or its negation Sj. A clause is the logical OR between 3 distinct literals. 
It is thus true as soon as one of the literals is true. A formula is the logical AND between M clauses, it is true if and 
only if all the clauses are true. A formula is said to be satisfiable if there is an assignment of the variables such that 
the formula is true, unsatisfiable otherwise. 

3-SAT is a NP-complete problem[l|; it is believed that there is no algorithm capable of solving every instance of 
3-SAT in a time bounded from above by a polynomial of the size of the instance. How well do existing and a priori 
exponential algorithm perform in practice? To answer these questions, computer scientists have devised a simple way 
of generating random instances of the 3-SAT problem, with a rich pattern of hardness of resolution 01 . Formulas are 
drawn in the following way. Repeat M times independently the same process: pick up a 3-uplet of distinct indices 
in [l,iV], uniformly on all possible 3-uplets. For each of the 3 corresponding variables, choose the variable itself or 
its negation with equal probability (1/2), and construct a clause with the chosen literals. Repetition of this process 
M times gives a set of M independently chosen clauses, whose conjunction is the generated instance. The random 
generation of formula makes the set of possible formulas a probability space, with a well defined measure. 

Numerical experiments indicate that a phase transition takes place when N, M — > oo at fixed ratio a = M/N of 
clauses per variables (in the thermodynamic limit). If a is smaller than some critical value a c ~ 4.3, a randomly 
drawn instance admits at least one solution with high probability. Beyond this threshold, instances are almost never 
satisfiable. The existence of this transition has not been proven rigorously yetl l7 | . but bounds on the threshold exist: 
the probability of satisfaction tends to 1 (respectively to 0) if a < 3.42|l8j fresp. a > 4.5060) All the above 
definitions can be extended to the K-SAT problem, where each clause is the disjunction of K, rather than 3, literals. 
2-SAT is an easy (polynomial) problem, while K-SAT is NP-Complete for any K > 3. Location of the threshold is 
rigorously known for 2-SAT (a c = 1) 20] but not for K > 3. We shall denote in the following averages on the random 
if-SAT ensemble by [•]. 

Statistical mechanics studies have pointed out the existence of another phase transition taking place in the satisfiable 
phase (a < a c ) with a location, first estimated to a s ~ 3.95 j^J and then to a s ~ 3.92 j2^. This phase transition is 
related to the microscopic structure of the set of solutions. Define d the Hamming distance between two solutions as 
the number of variables taking opposite values in these solutions. When a < a s , there exist an exponentially large 
(in N) number of solutions, each pair of which are separated by a path in the solution space, that is, a sequence of 
solutions with a O(l) Hamming distance between successive solutions along the path. The solution space is made of 
a single cluster of solutions. For a s < a < a c , the solution space breaks into an exponentially large (in N) number 
of clusters, separated by large voids prived of solutions. Two solutions in one cluster are linked through a path while 
there is no path in the solution space linking two solutions in two different clusters. This clustering phenomenon, 
whose discovery was inspired from previous works in the context of information storage in neural networks j2^, was 
subsequently found in various combinatorial problems |24l |25| . and rigorously demonstrated for the so-called random 
XORSAT problem[2(|. It is a zero temperature signature of the ergodicity breaking taking place in spin glasses [HI S3 - 
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Its precise relationship with dynamical properties, and in particular with the computational cost for finding a solution 
is not fully elucidated vetpllE?! l28|. 

B. The RandomWalkSAT algorithm 

The operation of RandomWalkSAT (also called Random Walk) on an instance of the K-SAT problem is as follows [l^. 

1. Choose randomly a configuration of the Boolean variables. 

2. If all clauses are satisfied, output "Satisfiable" . 

3. If not, choose randomly one of the unsatisfied clauses, and one of the K variables of this clause. Flip the variable. 
Notice that the selected clause is now satisfied, but the flip operation may have violated other clauses which 
were previously satisfied. 

4. Go to step 2, until a limit on the number of flips fixed beforehand has been reached. Then Output "Don't 
know" . 

What is the output of the algorithm? Either "Satisfiable" and a solution is exhibited, or "Don't know" and no 
certainty on the status of the formula is reached. Papadimitriou showed that RandomWalkSAT solves with high 
probability any satisfiable 2-SAT instance in a number of steps (flips) of the order of iV 2 [l2|. Recently Scheming was 
able to prove the following very interesting result for 3-SAT[2!j. Call 'trial' a run of RandomWalkSAT consisting of 
the random choice of an intial configuration followed by 3 x N steps of the procedure. If none of T successive trials 
of RandomWalkSAT on a given instance has been successfull (has provided a solution), then the probability that this 
instance is satisfiable is lower than exp(— T x (3/4)^). In other words, after T ^ (4/3)^ trials of RandomWalkSAT, 
most of the configuration space has been 'probed': if there were a solution, it would have been found. Though 
RandomWalkSAT is not a complete algorithm, the uncertainty on its output can be made as small as possible and it 
can be used to prove unsatisfiability (in a probabilistic sense). 

Schoning's bound is true for any instance. Restriction to special input distributions allows to improve his result. 
Alekhnovich and Ben-Sasson showed that instances drawn from the random 3-Satisfiability ensemble described above 
are solved in polynomial time with high probability when a is smaller than 1.63 30] . It is remarkable that, despite 
the quenched character of the disorder in this problem (the same clauses are seen various times in the course of the 
search), rigorous results on the dynamics of this spin model can be achieved. 

C. Phenomenology of the operation of RandomWalkSAT 

In this section, we briefly sketch the behaviour of RandomWalkSAT, as seen from numerical experiments |3l| and 
the analysis presented later in this article. We find that there is a dynamical threshold ay separating two regimes: 

• for a < ad — 2.7 for 3-SAT, the algorithm finds a solution very quickly, namely with a number of flips growing 
linearly with the number of variables N. Figure shows the plot of the fraction </?o of unsatisfied clauses as 
a function of time t (number of flips divided by M) for one instance with ratio a — 2 and N — 500 variables. 
The curves show a fast decrease from the initial value ((po(t = 0) = 1/8 in the large N limit independently 
of a) down to zero on a time scale t res — 0(1). Fluctuations are smaller and smaller as N grows. t res is an 
increasing function of a. This relaxation regime corresponds to the one studied by Alekhnovich and Ben-Sasson, 
and ad > 1-63 as expected |30j|. Figure|2J\ symbolizes the behaviour of the system in the relaxation regime. 

• for instances in the ad < a < a c range, the initial relaxation phase taking place on t = 0{1) time scale is 
not sufficient to reach a solution (Figure QJ3). The fraction ipo of unsat clauses then fluctuates around some 
plateau value for a very long time. On the plateau, the system is trapped in a metastable state. The life time 
of this metastable state (trapping time) is so huge that it is possible to define a (quasi) equilibrium probability 
distribution pn(<Po) for the fraction tpa of unsat clauses. (Inset of Figure^3). The distribution of fractions is well 
peaked around some average value (height of the plateau), with left and right tails decreasing exponentially fast 
with N, pn((Pq) ~ exp(N(((po)) with ( < (Figure 03). Eventually a large negative fluctuation will bring the 
system to a solution (i^o = 0). Assuming that these fluctuations are independant random events occuring with 
probability pn(0) on an interval of time of order 1, the resolution time is a stochastic variable with exponential 
distribution. Its average is, to leading exponential order, the inverse of the probability of resolution on the 0(1) 
time scale: [t res ] ~ exp(7V£) with ( — — C(0). Escape from the metastable state therefore occurs through barrier 
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FIG. 1: Fraction tpo of unsatisfied clauses as a function of time t (number of flips over M) for two randomly drawn instances 
of 3-SAT with ratios a = 2 (A) and a — 3 (B) with N = 500 variables. Note the difference of time scales between the two 
figures. Insets of figure B: left: blow up of the initial relaxation of ipo, taking place on the O(l) time scale as in (A); right: 
histogram P5oo(^o) of the fluctuations of <po on the plateau 1 < t < 130. 



crossing and takes place on exponentially large-in-JV time scales, as confirmed by numerical simulations for 
different sizes. Schdning's result 29] can be interpreted as a lower bound to the probability ((0) > ln(3/4), true 
for any instance. 

The plateau energy, that is, the fraction of unsatisfied clauses reached by RandomWalkSAT on the linear time 
scale is plotted on Figure |3J Notice that the "dynamic" critical value ad above which the plateau energy is positive 
(RandomWalkSAT stops finding a solution in linear time) is strictly smaller than the "static" ratio a c , where formulas 
go from satisfiable with high probability to unsatisfiable with high probability, fn the intermediate range < 
a < a c , instances are almost surely satisfiable but RandomWalkSAT needs an exponentially large time to prove 
so. Interestingly, ad and a c coincides for 2-SAT in agreement with Papadimitriou's result |l2|. Furthermore, the 
dynamical transition is apparently not related to the onset of clustering taking place at a s . 



III. EXACT RESULTS: SPECIAL CASES AND EXPANSIONS 



A. Evolution equations and quantum formalism 



Boolean variables will be hereafter represented by Ising spins, Si — 1 (resp. —1) when the Boolean variable Xi is true 
(resp. false). A microscopic configuration S is specified by the states of all variables: S = (Si, S2, ■ ■ ■ , Sn)- We then 
define a 2 N dimensional linear space with canonical basis {|S)}, orthonormal for the scalar product (S'|S) = Yii ^S', Si- 
Let us denote Prob[S, T) the probability that the system configuration is S at time T i.e. after T steps of the algorithm, 
and define [^3 

|S(T)) = £Prob[S,T]|S) (1) 
s 

as the (time-dependent) vectorial state of the system. Knowledge of this vector gives access to the probability 
Prob[S,T] = (S|S(T)) of being in a certain configuration S. 

RandomWalkSAT defines a Markov process on the set of configurations; during one step of the algorithm the state 
vector of the system changes according to 

\8(T + l)) = W d \8(T)) (2) 
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FIG. 2: Schematic picture of the operation of Random WalkSAT for ratios a smaller (A) or larger (B) than the dynamical 
threshold q^. Vertical axis is minus the logarithm of the probability (divided by N) that the system has a fraction tpo of 
unsat clauses after a large number of Random WalkSAT flips. Its representative curve can be seen as an energy potential in 
which the configuration (represented by an empty ball) rolls down towards the more probable value of the order parameter ipo, 
or up, through stochastic fluctuations. The starting configuration violates tpo = 1/8 of the clauses (point I). At small ratios 
(A), the configuration rolls down to reach point S through a sequence of intermediary points (F). Resolution is essentially a 
fast relaxation towards S (O(l) time scale). At large ratios (B), the ball first relaxes to the bottom of the well (point P with 
abscissa corresponding to the plateau height). Then slow negative (ball A) or positive (ball B) fluctuations of the fraction </?o 
take place on exponentially (in N) long time scales. The time t res ~ exp(7V £) it takes to the system to reach a solution (point 
S) is, on the average, equal to the inverse probability exp(iV £) that a fluctuation drives the system to S. 
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FIG. 3: Fraction tpo of unsatisfied clauses on the metastable plateau as a function of the ratio a of clauses per variable. 
Diamonds are the output of numerical experiment, and have been obtained through average of data from simulations at a given 
size N (nb. of variables) over 1,000 samples of 3-SAT, and extrapolation to infinite sizes (dotted line serves as a guide to the 
eye). The ratio at which ipo begins being positive, ad — 2.7, is smaller than the thresholds a s — 3.9 and a c — 4.3 above which 
solutions gather into distinct clusters and instances have almost surely no solution respectively. The full line represents the 
prediction of the Markovian approximations of Section ft V CI and II V El 



where the evolution operator in discrete time W4 reads 

M N I N \ M 

In the above expression, we have made use of several new notations that we now explain. The M X N matrix 
Cu encodes the instance : Cn equals 1 if the £-th clause contains the literal Xi, —1 if it contains the literal 2^, 
and otherwise. Since every clause contains K literals, J2i = K ^ The Pauli operators are defined through 
erf I S) = Si|S), and erf |S) = |S J ), where S l is the configuration obtained from S by flipping the i-th spin. It is a simple 
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check that the argument of function gx in Ui is a diagonal operator in the canonical basis {|S)}, with eigenvalues 
x = J2i=i CuSi in {— K, —K + 2, . . . , K — 2, K}, equal to —K if and only if clause £ is unsatisfied. The function 

1 K ~ 1 

g K (x)=S x ^ K = ^ ]r ^'[[(K~2p-x) (4) 

is a polynomial of degree K in x equal to 1 if a; = ~K and to zero for all the other possible eigenvalues. Operator E 
is diagonal in the canonical basis too, with eigenvalues equal to the numbers of unsatisfied clauses (also called energy) 
in each configuration. Therefore Ui ■ E^ 1 in acts as a filter retaining clause £ only if it unsatisfied, and with a 
weight equal to the inverse number of unsatisfied clauses: only unsat clauses can be chosen at each time step, each of 
them with the same probability. flips the spins of clause £, each with probability 1/K. Notice that Wd conserve 
probabilities: (0|Wd = (0\ where (0\ — J2s(^\ 1S * ne superposition of all possible states. 

In the thermodynamic limit, the evolution can be rewritten in continuous time, defining t — T/M, 

^|S(t)) = W\S(t)) , W = M(W d -l) (5) 

Formally, the solution of this equation is |S(t)) = e tw/ |S(0)), where |S(0)) = (l/2 iV )5^ s |S) since the initial configu- 
ration is random. An important quantity to compute is the average fraction of unsatisfied clauses at time t, i.e. after 
T = Mt steps of the algorithm, averaged both on the history of the algorithm and on the distribution of formulas: 

w(t) = -fi[(0\Ee t *\8(p))] (6) 

For the sake of analytical simplicity, we shall study a slightly different evolution operator in the next two subsections, 
and use the parameter it to denote the time parameter of this modified evolution, 

1 M rl 

W' = W--E = Y,{h-i)-U t , J-|S(u)) = W'|S(u)) (7) 

i=i u 

Let us explain the meaning of this modification. The operator W would have been obtained starting from the 
following variant of the RandomWalkSAT stochastic process. At each step, choose a clause among the M ones. If it 
is satisfied, do nothing. If it is unsatisfied, flip one of its variables. 

In the thermodynamic limit, the fraction of unsatisfied clauses ifo is expected to become a self averaging quantity, 
i.e. to be peaked with high probability around its (t-dependent) mean value. Turning W into W' thus amounts to a 
local redefinition of time, independent of the instance of the problem. In definition J7J, the operator E/M can be 
replaced with its mean value ipo, leading to -4- = fo^i- ^ ne knowledge of the ^'-evolution of any observable in 
terms of u can then be rewritten in terms of t through 



t{u) = I du'ip (u') . (8) 
Jo 

Examples of this time reparametrization will be shown below. 



B. The K = 1 case 



Let us first study the simple case K = 1. A clause is then a single literal, i.e. either a variable or its negation. 
If both a variable and its negation appear in the formula, it is obviously unsatisfiable. This is the case, with high 
probability in the thermodynamic limit, as soon as a > 0. Static properties of 1-SAT model are known exactly |33| . 
and its dynamics under the RandomWalkSAT evolution can be solved too. 

An instance is described by a set of integers {rrii,ni}, where m% is the number of clauses in which the variable i 
appears, and (rrii — Uj)/2 is the number of times it has been chosen without negation. The evolution operator in terms 
of it-time is the sum of site-operators 



= - {m.af - to; + njof - r^of of ) 



(9) 
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FIG. 4: Comparison between numerics and analytical results for K = 1. Left panel: fraction of unsat clauses ipo(t) as a function 
of time t for a = 3 (top) and a = 1 (bottom). Dashed line: analytical curve, solid line (almost superimposed): numerical 
results (single run with N = 10 4 ). Right panel: asymptotic fraction of unsat clauses as a function of a, solid line: theoretical 
prediction (1131 1. symbols: numerical results (averaged over 10 runs, over t £ [8, 10], for N = 10 4 ), dashed line: groundstate 
energy tp GS . 



As the operators W[ on different sites commute, the evolution operator can be diagonalized in each of the i subsets 
independently, and the vector state is a tensor product, 

5US S =±i(l + ^(l-e- mi "» ifm^O (iUj 

where |±)^ are the eigenvectors of of with eigenvalues ±1. The fraction of unsatisfied clauses for a given instance and 
averaged over the choices of the algorithm reads, 

In the thermodynamic limit, the mj's become independent random variables with identical Poisson distributions of 
parameter a; (m^ — n^)/2 obeys a Binomial law of parameter 1/2 among to,. Performing the quenched average over 
the formula, the fraction of unsatisfied clauses reads 

1 l-exp(q(e-"-l)) 
= 2 Ta (12) 

These exact results are compared with numerical simulations in Figure On the right panel has been drawn the 
asymptotic fraction of unsatisfied clauses obtained at large times, compared with the analytical prediction made 
above. The left panel shows the time evolution of the fraction of unsatisfied clauses as a function of time t, for 
two different values of a. The analytical curve has been obtained through the rescaling explained at the end of the 
previous subsection: from the exact value of tpo(u) given above, the original time t(u) has been obtained by numerical 
integration (eqn. ©), and the plot ipo(t) is a parametric plot {t — t(u), ipo = (po(u)}, parametrized by u. 
Note that the asymptotic value of the energy reached is larger than the groundstate one |3^| , 

lim p„(t) = ~ + - >\[\- e~ a I (a) - e~ a h{a)} = <p GS (13) 

where /„ is the n-th modified Bessel function. This is easily understood: if a formula contains x\ once and x\ twice, 
the optimal value is x\ false. But the algorithm does not stop in this configuration, as the clause x\ is then violated. 
RandomWalkSAT will keep on flipping the variable and make the average energy higher than the optimal one. 



C. The satisfiable phase of the K = 2 case 

We now turn to the 2-SAT case, where every clause involves two variables. It has been rigorously proven that there 
is a sharp threshold phenomenon taking place at ct c = 1 in this problem |20j : for lower values of a the formulas are 
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almost always satisfiable, beyond this threshold they are almost always unsatisfiable. The dynamical critical threshold 
of Random WalkSAT has the same value, a<j = 1 12]. Thus there is no metastability in 2-SAT: if a solution of the 
formula exists it is almost surely found in polynomial time. Two questions of interest are: how fast will the number 
of unsatisfiable clauses decrease during the evolution of the process and how long will it take the algorithm to find a 
solution for a < 1? What will be the typical energy after a long run of the algorithm when a > 11 In this and the 
next subsections we shall address the first of these two questions, and leave the second one to the next section. 

As in the K = 1 case, one can use the quantum formalism and write down the evolution operator W . There appear 
both single site terms and couplings between pairs of sites, 



with 



w 



E^+E^ 



W- — - (rmaf -m.i + 2n l of - rucrfaf) 



1 I a i a 3 - h, ,rr, CTj 



(14) 

(15) 
(16) 



Cij 



r 2 



a. 



E C H C U 



(17) 



Two-sites operators W-j do not commute when they share a variable, and W' cannot be factorized as in the K = 1 
case. Yet, a cluster expansion in powers of a can be implemented. A detailed presentation of this cluster expansion 
in a closely related context has been given elsewhere |34|: the reader is referred to this previous work for more details. 
The method is presented below for a generic value of K. We call cluster and denote F r a maximal set of variables 
connected by clauses. Any formula F can be decomposed as a conjunction of these clusters (note that despite the 
similarities in denominations, these clusters have nothing to do with the clustering phenomenon in configuration 
space). Consider now a quantity Q, depending of the realization of the disorder, namely of formula F, and additive 
with respect to the clusters decomposition, Q(F) = J^f Q(Pr)- Calculation of the average [Q] over the random 
ensemble formulas of Q can be done as follows. First, consider the different possible topologies s of the clusters, 
compute [Q] s , the quantity averaged over the choices of signs for a given topology i.e. whether variables appear 
negated or not in a clause. Secondly, add up these contributions with combinatorial factors giving the frequency of 
appearance of the different topologies, 



M 



E — 



-P, 



(18) 



The sum runs over the different topologies, L s is the number of sites in such a cluster, and P s is the probability of 
a given site belonging to an s-type cluster. In the thermodynamic limit, the finite-size clusters which contribute to 



this sum are tree-like: P s — (aK\) T 



-L.aK 



K s , where m s — (L s — 1)/{K— 1) is the number of clauses of the cluster, 



and K s a symmetry factor. The smallest tree- like clusters are shown in Figure Clauses are represented by dashed 
lines (for K — 2) or stars (for K > 3). In principle, such an expansion is meaningful only below the percolation 
threshold of the underlying random hypergraph, a p = 1/(K(K — 1)). Beyond oe p , a giant component appear, whereas 
the series (|18J) takes into account clusters of size O(lniV) only. However, rearranging the series as an expansion in 
powers of a (expanding out the exponential in P s ) allows to extend its domain of validity beyond a p , at least for 
quantities that are not singular at the percolation threshold. 

This method can be employed here: the evolution operator W' is a sum of operators for each cluster W'(F r ). 
Operators attached to two distinct clusters commute with each other as they do not have common variables. The 
energy operator E can also be written as a sum over the clusters, thus the energy averaged over the history of the 
algorithm has the property of additivity over clusters if the time evolution is studied in terms of u. After averaging 
over the formulas, the fraction of unsatisfied clauses reads, 



(19) 



The evolution and energy operators for a cluster with n variables are 2 ra x 2 n matrices, W' s and E s respectively. With 
the help of a symbolic computation software one can easily study the small finite-size clusters by computing 



[(E)(u)] s = (O\E s e w «»\S(0)) 



(20) 
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Type L s K„ 



a 2 


1 


b 3 


3/2 


c 4 


2 


d 4 


2/3 



[{E)(u)] s 

4 C 



J_ e -2« , _5_ - 



+ ie 



-u/2 



e~ 3u + 10e~ 2u + 4e~ 3u/2 + 53e~ u + 20e~ u/2 + 2(2 - ^/2)e-^ lu + 2(2 + V2)e~ : 
e~ 3u + 10e-' 2u + 25e- u + 32e~ u/2 - 2(1 + V3)e~^ /Iu - 2(1 - v^e -2 ^ 1 



\tres\s 

1/4 
19/32 

125/128 
259/256 



TABLE I: Contributions to the cluster expansions for K = 2. See left panel of Figure 
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FIG. 5: Tree-like clusters contributing to the expansions. Left panel: K = 2. Right panel: K = 3. 

where (0\ and |S(0)) are now 2™ row and column vectors, and the average is over the choices of negating or not the 
variables in the clauses. 

We have performed this task for clusters with up to four sites, their contributions are summarized in Table [I] Up 
to order a 2 , the expansion leads to 



-a 2 | -r 



3 -u/2 i — P - U _|_ I p -3«/2 _ 9-2" i Ap-3" 1 

+ 64 6 + 8 32 6 + 64 e ^ 



The typical value ipo(t) of the fraction of unsatisfied clauses after T = tM steps of the algorithm is obtained from 
(|21|l through the rescaling of time defined in eqn. Our theory is compared in Figure El to numerical simulations. 
The agreement is excellent at the beginning of the time evolution, and worsens at the end. A factor of explanation is 
that during the last steps of the algorithm, the fraction of unsatisfied clauses is low and thus for finite-size samples, 
the self averaging hypothesis is violated. 

Our approach is applicable to any value of K (note however that the size of the matrices to be diagonalized grows 
faster with the number of clauses in the clusters studied, that is, with the order in a in the expansion) but is restricted 
to the a < ad regime. The finite-size tree clusters considered at any (finite) order in the expansion are indeed solved 
in linear time by Random WalkS AT. In Section IIVI another kind of approximation will allow to study the a > ad 
regime . 



D. Cluster analysis of resolution times 

As the number of steps needed to solve a formula is an additive quantity over clusters, it can be calculated along 
the lines exposed above for the expansion of ipg. Let us give two examples about how to compute the average time 
needed to solve a cluster of fixed topology. 
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FIG. 6: Fraction of unsat clauses as a function of time for K = 2, a — 0.6 (top) and a = 0.3 (bottom). Dashed lines: cluster 
expansion prediction, solid lines (almost superimposed): numerical results (averaged over 10 runs for N = 10,000). 



The simplest example is a cluster made of a single 2-SAT clause, (x\ Va^). With probability 3/4, the initial random 
condition is already a solution. If not, it will take one time step to solve the cluster, as any of the two possible spin 
flips will lead to a solution. Thus the average resolution time is 1/4. Similar analysis can be done on bigger clusters, 
even if the counting becomes increasingly tiresome. As a byproduct, some flaws of the RandomWalkSAT heuristic 
appear. 

The second example is the cluster (xi V12) A (X2 V 2:3). The eight configurations of the variables are represented 
in Figure with up or down spins corresponding to true or false boolean variables. The four solutions are drawn 
in the shaded box. Two configurations are turned into solutions in one flip (single outcoming arrow). Flip of a 
spin from each of the remaining two configurations (with two departing arrows) , leads to a solution, or to the other 
configuration. Thus the average resolution time starting from one of these configurations is two timcstcps. Finally, 
the average resolution time for the cluster is 3/4. To obtain the average over the signs of this resolution time, one 
still has to make the same study for the case where the two clauses are not contradictory on the central spin, hence 
the value 19/32 on line b of Table [Q 

Obviously, RandomWalkSAT can make "bad" choices on this very simple example, and "oscillate" a few time steps 
between the two configurations before finding a solution. One can imagine many local search heuristics which would 
do better than RandomWalkSAT on the example shown before. For instance, one could modify the algorithm so that 
once an unsatisfied clause has been chosen randomly, it flips preferably a variable with a low number of neighboors, 
or one with the lowest number of contradicting clauses on it. The average resolution time of any of these heuristics, 
as long as the information used to choose the variable to be flipped remains local, can be studied by such a cluster 
expansion. These simple enumerations could then provide an useful test ground for new heuristics. 

For general K , the enumeration of the possible histories of clusters with three or less clauses leads to the quantities 
given in Table ITT1 

M , „. 1 K(K+l) 1 AK & + K 5 + 6K 3 - 10K 2 + 2K 1 , 3 . 

This prediction is in good agreement with numerical simulations, see Figure |H1 for results in the K = 3 case. 

The validity of expression 112211 can be easily checked for K = 2 from the findings of Section UlI CI Indeed, in terms 
of the time t = T/M, the fraction of unsat clauses tpo(t) vanishes after a finite time t res given by 

1*00 

t res = lim tin) = I du'ipo(u') (23) 

u^oo J Q 

Integration of (|21|l coincides with prediction l|22[) for t res (a,2). 



IV. APPROXIMATE ANALYSIS OF RANDOMWALKSAT 



In this section, an analysis of RandomWalkSAT, based on a Markovian approximation for the evolution equations 
@ is proposed. It allows a quantitative description of the main features of RandomWalkSAT, namely the asymptotic 
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FIG. 7: An example of the behaviour of RandomWalkSAT on a finite cluster (xi V 22) A {x% V 23). Blocks of spins represent 
configurations of the boolean variables, those in the shaded box are solutions. Arrows stand for possible transitions of the 
algorithm, see the text for details. 




FIG. 8: Average resolution time t re3 (a, 3) for 3-SAT. Symbols: numerical simulations, averaged over 1, 000 runs for N = 10, 000. 
Solid line: prediction from the cluster expansion 1221 . 



energy and the (exponentially small) probability of resolution in linear time for generic values of K and ev(> aj). 
approximation scheme is also applied to the analysis of RandomWalkSAT on the XORSAT problem[35|. 



The 



A. Projected evolution 

Consider an instance of the if- SAT problem. RandomWalkSAT defines a Markov process on the space of configura- 
tions of the Boolean variables, see eqn. J5J, of cardinality 2 . Call S(T) the configuration of the Boolean variables at a 
given instant T (number of flips) of the evolution of the algorithm. An observable 1Z is a function of the configuration 
e.g. the number of clauses violated by S. The principle of the approach developed now is to track the evolution of 
1Z(T) = 1Z(S(T)), that is, of one number (or a low-dimensional vector) instead of the whole configuration of spins. 
To do so, we make use of the projection operator formalism exposed below. 

Let us partition the configuration space into equivalence classes of microscopic configurations S associated to the 



Type 



K s {K\) m >/L s 



a 1 


K 




1 


b' 


2K - 


1 


K 2 
2 


1 

c 


3K- 


2 


K 3 (K-1) 
2 


d' 


3K- 


2 


K 3 

e 



3 • T 



2^ 

1 TnA'+l , K+l 
|_ + K(K-V)\ 



I C\K-\-\ K -f- 1 j l. (A :.'.(V , 't\ ■ 

Z K(K-l) :jK 2 (K-l)(2K-l)(K'< ! 



3 ■ r- 



-.K 3(A'+1) 
J K(K-l) 



2K+1 

K'-'(K-l) 



TABLE II: Contributions to the cluster expansion for the resolution time, general K. See right panel of Figure 
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same value of the macroscopic observable TZ(S). We call il(R) = {S|7?.(S) = R} these classes, and |^(-R)| their 
cardinalities (number of configurations in these classes). Let us define the projection operator V through its entries, 

(Sii ^ |s2) -^W WSi) ^ (S2)) ' (24) 

where S denotes the (vectorial) Kronecker function. One can easily check that it is indeed a projector, V 2 = V , which 
connects only configurations within the same class. 

Now consider the state vector |S(T)) Q and its projection |P(T)) = V\S(T)). Its components have the same 
value in each class, which is the average of |S(T)) over the microscopic configurations in the class. Call |Q(T)) = 
(1 - P)\S(T)) = |S(T)) - |P(T)>. From the master equation ©, we obtain 

|P(T+1)) = V-W d \P(T))+V-W d \Q(T)) 

|Q(T + 1)) = (l-V)'W d \P(T)) + (l-V)'W d \Q(T)) (25) 
The second equation can be formally "integrated" by iteration, 

T 

|Q(T)> = £ ((1 - V) ■ W d f |P(T - T')) (26) 

T' = l 

where the initial state vector |S(0)) has been assumed to be uniform on each class, so that |Q(0)) = 0. Finally, 

T 

|P(T + 1)) = V ■ W d ■ ((1 - V) ■ W d ) T ' |P(T - T')) (27) 

T'=0 

Equation (|27|l expresses that, once coarse-gained by the action of the projection operation, the dynamics is not 
Markovian any longer. The principle of our approximation is precisely to omit all memory effects, by neglecting non 
Markovian terms i.e. T' > 1 contributions in (|27|l . and averaging over disorder at each time step, 

\P(T + 1)) ~ [V ■ W d ] |P(T)) (28) 
Obviously, the quality of the approximation depends on the observable 1Z. We shall see two examples in what follows. 

B. Transition matrix for the number of unsatisfied clauses 

A natural choice we study in this and the next two subsections for the observable 1Z is A^O: which measures the 
numbers M$ of clauses unsatisfied by each configuration of the variables. Defining the bra 

(M \= J2 < S I - ( 29 ) 
sen(Mo) 

and the probability Prob[Afo,T] = (Mq\P(T)) that the configuration of the variables is in class Mq at time T, we 
obtain within the Markovian approximation l|28l) . 

Prob[M^T + l]=^ J 4 M , Mo Prob[M ,T] , (30) 

M 

with A M > M a = N M ^ Mo I D Mo and 

N 

N K M = E E 6 ( M ° - M °W) S ( M o - M (&)) Pi(S) 

3=1 S 

D M = E 5 ( M o-M (S)) (31) 
s 

where Pj(S) is the probability of flipping spin k when the system is in configuration S i.e. the number of unsat 
clauses in which spin j appears divided by the number of unsat clauses. S J denotes the configuration obtained from 
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S by flipping spin j. The meaning of our Markovian approximation is clear: the transition rate from one value of 
the observable M to another is the average of the microscopic transition rates from one microscopic configuration 
belonging to the first subset f2(M ) to another belonging to the second one, with a flat average on the starting subset. 
At time T, the only available information in the projected process is that the system is somewhere in the subset, and 
none of the corresponding microscopic configurations can be privileged. 

To perform the average over the disorder, i.e. on the random distribution of formulas and compute [Am 1 m ] = 
[Nm^Mo/ Dmq], we shall do the further approximation that the numerator and the denominator can be averaged 
separately. This 'annealed' hypothesis can be justified in some cases, see Section llV Gl After some combinatorics, we 
find 



Z U N [Mq\ (M - M Q \ ( K\ M ° Zu /iV x Z " 



. ^KM \Z U J \ Z s J \ NJ \N 

K \M-M*-Z. , R sZ s 

l--r~p ; — ; — 8{M' n - M + Z u - Z s ) . (32) 

(2 K -1)NJ \(2 K -1)NJ v u ' v ' 

Z u is the number of unsatisfied clauses which contains the variable to be flipped. All these clauses will become 
satisfied after the flip. The factor Z u j {KMq) represents the probability of flip of the variable, the factor TV coming 
from the sum over its index j. Z s is the number of clauses satisfied prior to the flip and violated after. The meaning 
of the Binomial laws is transparent. Assume that the configuration violates Mq clauses. In the absence of further 
information, the variable which is going to flip has probability K/N to be present in a given clause (there are (^) 
possible JV-uplets over TV variables, and Ck~i) which include a given variable). Furthermore, a satisfied clauses 

that contains the flipped variable has a probability 1/(2 A ' — 1) to become unsatisfied later. Z u (respectively Z s ) is 
thus drawn from a binomial law with parameter K/N (resp. K/(N(2 K — 1))) , over Mo (resp. M — Mo) tries. This 
reasoning unveils the physical significance of our Markovian approximation; we neglect all correlations between flipped 
variables and clauses that inevitably arise as the algorithm runs beyond the description in terms of the macroscopic 
variable Mq. 



C. Average evolution and the metastable plateau 

The evolution equation for the average fraction (p${T) — YIm Mq ProbfMo, T]/M of unsatisfied clauses at time 
T = tM is easily computed in the large size limit from l|3UI32|l . In particular, the average fraction of unsat clauses 
equals 

with (p (0) = l/2 K . Two re gimes appear. If the ratio a is smaller than the critical value 

2 K - 1 

MK) - — — , (34) 

the average fraction of unsat clauses ipo vanishes after a finite time t res . Typically, the algorithm will find a solution 
after t res x M steps (linear in N), and then stops. Predictions for ad are in good but not perfect agreement with 
estimates from numerical simulations e.g. ad — 7/3 vs. ad — 2.7 — 2.8 for 3-SAT. The average resolution time 
t res (a, K) predicted within this approximation is given by the time at ipo(t) in eqn l|33|l vanishes. It logarithmically 
diverges as a reaches the dynamical threshold at fixed K, 

t res (a,K) ~__L]n(Qa(jr)-a) , a -> a d {K)~ . (35) 
On the contrary, when a > ad(K), ipo converges to a finite and positive value 

atojo-^i-s^o) (36) 

when t — ► oo (Figure EJl . RandomWalkS AT is not able to find a solution and gets trapped at a positive level of 
unsatisfied clauses. This situation arises in the limit T oc N,N — > oo, and corresponds to the metastable plateau 
identified in Section III CI 
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D. Large deviations and escape from the metastable plateau 

As explained above, when a > ad, the system gets almost surely trapped in a metastable portion of the configuration 
space with a non zero number of unsatisfied clauses. Numerical experiments indicate the existence of an exponentially 
small-in-iV probability ~ exp(N £(a)) with C < that this scenario is not correct, and that a solution is indeed found 
in a linear time. We now make use of our Markovian hypothesis to derive an approximate expression for £. 

Contrary to the previous section, we now consider the large deviation of the process with respect to its typical 
behaviour. This can be accessed through the study of the large deviation function n(tpo,t) of the fraction ipo |38j . 

n((fo,t) = lim — lnProb[M = M <p , T = t M] (37) 

Introduction of the reduced time is a consequence of the following remark. 0(1) changes in the fraction tpo, that is, 
0(N) changes in the number M of unsatisfied clauses are most likely to occur after a number of flips of the order of 
N. To compute the large deviation function tt, we introduce the generating function of Mo, 

G[y, T] = J2 Prob[M , T] exp(y M ) , (38) 

Mo 

where y is a real-valued number. In the thermodynamic limit, G is expected to scale exponentially with TV with a 
rate 

9(y>t)= Jim In G[y,T = tM] = max[w((p 0) t) + ay <p ] , (39) 

equal to the Legendre transform of tt from insertion of definition (|37|l into eqn. (|38|l . Using evolution equations 
(|3t)l32|l . we obtain the following equation for g, 

I*M . _, + - l) + K («-. _ , _ _,L_ ( , . „) *M . m 

along with the initial condition 



g(y,0) =a In — + . (41) 

The average evolution studied in the previous section can be found again from the location of the maximum of tt 
or, equivalently, from the derivative of g in y = 0: ipo(t) = (1/a) dg/dy(0,t). The logarithm of the probability of 
resolution (divided by N) at times large compared to N, but very small compared to expO(iV), is given by 

C(a, K) = vr(^ = 0, i -» oo) - f ) dy z{y, a) , (42) 



where z(y,a) = (y- aK{e y - 1)/(2 A ' - \))/{K(e' y - 1 - (e y - l)/(2 K - 1))) and y{a) is the negative root of z. 

Predictions for £ in the K ~ '3 case are plotted in Figure 03 They are compared to experimental measures of £, 
that is, the logarithm (divided by N) of the average resolution times. It is expected on intuitive grounds exposed in 
Section Hi CI that £ coincides with — £ (Figure (2J. Despite the roughness of our Markovian approximation, theoretical 
predictions arc in qualitative agreement with numerical experiments. 



E. Taking into account clauses types 

The calculation of Section IV. B can be extended to other observables TZ. In the following, we consider the case of 
a vectorial observable Ad, with K + 1 components. Our starting point is the classification of clauses into 'types'. A 
clause is said to be of type i, with i = 0, . . . , K , if the variables of the configuration S satisfy i among K of its literals. 
If i = the clause is unsatisfied while, as soon as i > 1, the clause is satisfied. Let us call Mj(S) the number of clauses 
of type i, and M(S) = (Mo(S), . . . , Mr-(S)) the vector made with these population sizes. Clearly, J2i = M 

for any configuration. If Mq(S) — then S is a solution of the formula. Vector M is a natural characterization of 
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ratio a of clauses per variable 

2 2.4 2.8 3.2 3.6 




-0.06 



FIG. 9: Large deviations for the 3-SAT problem. The logarithm £ of the probability of successful resolution (over the linear in 
N time scale) is plotted as a function of the ratio a of clauses per variables. Predictions for f (a, 3) have been obtained within 
the approximations of Section llV Dl (equation 1421 . dot-dashed curve) and Section llV El (fourth order solution of eqn 1471 . 
solid curve). Diamonds corresponds to (minus) the logarithm ( of the average resolution times (averaged over 2,000 to 10,000 
samples depending on the values of a, N, divided by N and extrapolated to N — > oo) obtained from numerical simulations. 
Error bars are of the order of the size of the diamond symbol. Schoning's bound is f > ln(3/4) ~ —0.288. 



the configuration of variables (and of the instance), and contains essential information about the operation of the 
algorithm. Indeed, the algorithm stops if the number Mq of unsatisfied clauses vanishes. In addition, at each step of 
the algorithm, a single variable is flipped; clauses of type i become of type i ± 1 if they include this variable, remain 
of type i otherwise. 

Within our Markovian annealed approximation, the probability Prob[M,T] that the configuration of the variables 
is in class S1(M) at time T obeys the evolution equation, 



Prob[Af', T + 1] = Y, \ A m>m] Pro MM, T] , 



with 



iv y - — 
[^M'All = zZkW 5 (m' -m-a-z) P(Z\M) 



KM, 



(43) 



(44) 



where Z = (Zq, Z\, Z(, . . . , Zi, Zf, . . . , Zk-i, Zf ( _ 1 , Zk) is a 2K — 1 dimensional vector. Component Zi is the number 
of clauses of type i where the variable which is going to flip appears. In Z? of these Zi clauses, this variable was 
one of the i satisfying literals. It is not necessary to introduce components Zq and Z S K for they have obvious values 
(respectively equal to and Zx). A denotes a (K + 1) x (2K — 1) matrix such that A • Z gives the change in the 
observable M. when the variable is flipped. The iih line of A • Z reads —Zi + Zf +l + (^_i — Zf_ ± ). Clauses that 
contain the flipped variable and were of type i prior to the flip are not any longer of this type after the flip (hence 
the term —Zi), those which were of type i + 1 and satisfied by the variable become of type i (+Zf +1 ), as those which 

were of type i — 1 and unsatisfied by the flipping variable — Zf_ x ). The probability of Z conditioned to M is, 

as in the simpler case of Section IIVBI a product of Binomial laws, 



K 



Mi-Zi K-l 



Repeating the procedures of Sections IIV CI and IIVDI we find 

• The average fraction of unsat clauses is calculated in Appendix^ and reads 



Zi-z; 



(45) 



ip (a,K,t) = + 



1 

aK 



(1 + tanh(erf)) 



K 



(46) 



16 



The critical ratio separating polynomial from exponential resolutions is found at the same value as in Sec- 
tion llV Bl ctd(K) = (2 K — 1)/K. The average fraction of unsatisfied clauses on the plateau (ipo in the t — ► oo 
limit) when a > ctd(K) has also the same expression, cf eqn. H36|) . Note however that the finite time evolution 
differs in the two calculations. 

When a < ad, the resolution time t res is given by the vanishing of ipo in eqn (|46|l . and differs from its value 
found within the simpler approximation of Section IfV Bl 

• The probability of easy (linear time) resolution is accessible from the large deviation function n(tp, t) of the 
fractions <fi of clauses of type < i < K . Its Legendre transform g(y,t) obeys the PDE 

= -.o + m + £ ((* - *)e— + * - K) « (47) 

a at ^—^ oyt 

along with the initial condition 




The logarithm of the probability of resolution (divided by N) at times large compared to N, but very small 
compared to expO(iV), is given by 

C(a,K) = maxg(y ,yi = 2/2 = ■•• = Vk =0,t) . (49) 

Vo 

We have not been able to caculate exactly £ for generic values of a and K, but have resorted to a polynomial 
expansion of g in powers of its arguments yi. The expansion has been done up to order 4 with the help of a 
symbolic computation software for K = - 3, and up to order 2 analytically for any K. Calculations are exposed 
in Appendix iBl Predictions for ((a) in the K = 3 case are plotted in Figure ED 



F. The large K limit 

Comparison between results of Sections IIVBI and IIV El shows that the output of the calculation quantitatively 
depends on the observable under study. However, we may expect some simplification to take place for large K. In 
this limit, if a clause gets unsatisfied twice, or more (but <C K times), it is very unlikely that each variable will be 
flipped more than once, and memory effects are lost. Therefore, the Markovian annealed approximation is expected 
to become correct. However, to avoid a trivial limit, the ratio a of clauses per variable must be rescaled accordingly. 
Inspection of the above result (|34|) indicate that the correct scaling is a, K — > oo at fixed ratio a* = aK/2 K . The 
dynamical threshold separating linear from exponential resolutions is located in 

a* d = l . (50) 



As the critical threshold of K-SAT is known to scale as a c (K) ~ 2 K In 2 for large K [33, |33, instances are always 
satisfiable on the reduced a* scale. 

For a* < 1, the initial fraction of unsatisfied clauses is ~ 1/2 K , and decreases by O(l) per unit of reduced time t, 
giving t res ~ 1/2 . For the same reason, the height ipo, which is reached after a 0(1) relaxation time when a* > 1, 
is of the order of 1/2 K . It is therefore natural to define the rescaled fraction of unsatisfied clauses through 

¥>2(a*,t*)= Urn 2 K ^(a*2 K /K,K,t*/2 K ) , (51) 

from which we obtain the rescaled resolution time t* es {a*) for a* < 1 (vanishing of </?q) and plateau height <p£(a*) 
for a* > 1 (limit value of ipQ at large rescaled times). The two schemes of approximation of Section TlV Bl and IIV El 
both yield 

¥>S(«*,0 = 1 + ^(e _Q * t * -l) (52) 
t* res (a*) = ~ ln(l - a*) = 1 + \a* + 1 (a*) 2 + 0((a*) 3 ) (53) 
?t{a*) = 1 - -L = tt * - 1 + 0((a* - l) 2 ) , (54) 
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Note that the small a* expansion (|53H for the resolution time coincides with the exact expansion obtained from eqn 
(|22H with the above rescaling of a and K. We conjecture that the equality holds for higher orders (> 3) in a*, and 
that the above expressions for ipl(a*,t*) and thus for t* es (a*), <^5( a *) are correct. 

The logarithm of the probability of linear time resolution for a > ad(K) needs to be rescaled too, 

(*(a*)= lim K((a*2 K /K,K) (55) 

K — >oo 

to acquire a well defined limit when K — > oo. The scalar approximation of Section llV Dl gives the asymptotic result, 

(*(a*) = J dy V > = -(a* l) 2 + 0((a* - l) 3 ) , (56) 

where y(a*) is the negative root of the numerator in the above integral. The quadratic resolution of the PDE arising 
from the study of Section llV El Ccf . Appendix |B|) also leads to this result around a* = 1. Unfortunately, the exact 
results obtained in Section ITTll are of no help to confirm identity (|56H . 



G. The XORSAT case 



XORSAT is a version of a satisfiability problem, much simpler than SAT from a computational complexity point 
of view|24L I2H I35L l36l| . One still draws if-uplets of variables, but each clause bears only one 'sign' (instead of one for 
each variable in the KSAT version), and the clause is said to be satisfied if the exclusive OR (XOR) of its boolean 
variables is equal to the 'sign' of the clause. For a given clause, there are 2 K ~ X satisfiable assignments of the variables, 
and also 2 K_1 unsatisfiable assignments, in deep contrast with SAT where these numbers are respectively equal to 
2 K — 1 and 1. XORSAT may be recast as linear algebra problem, where a set of M equations involving TV Boolean 
variables must be satisfied modulo 2, and is therefore solvable in polynomial time by various methods e.g. Gaussian 
elimination. Nevertheless, it is legitimate to ask what the performances of local search methods as RandomWalkSAT 
are for this kind of computational problem. 

A fundamental feature of XORSAT is that, whenever a spin is flipped, all clauses where this spin appears change 
status: the satisfied ones become unsatisfied, and vice versa. There is thus no need to distinguish between clauses 
satisfied by a different number of literals, and the macroscopic observable we track is the number Mo of unsatisfied 
clauses for configuration S as in Section ITVRl It is an easy check that the transition matrix [A] for XORSAT is given 
by eqn (|32[1 where 2 K — 1 is replaced with 1. Main results are: 

• The average fraction of unsatisfied clauses <f o{t) reads 

Ma,K,t) = ± + ^(e- 2aKt -l) , (57) 

and becomes asymptotically strictly positive if the ratio a of clauses per variables exceeds ctd(K) = 1/K, smaller 
than the clustering and critical ratios, a s ~ 0.818 and a c ~ 0.918 for K = 3 respectively [2J, |2f|. The overall 
picture of the algorithm behavior is identical to the SAT case. 

• When a > c<d(K), the average fraction of unsatisfied clauses on the plateau is given by 

= \ (1-^) • (58) 

• The partial differential equation for the generating function is, 

1 d d 

-g- t 9(y,t) =-y + aK( e y - 1) + K(e-y - e y)—g(y,t) (59) 

with g(y,0) = odn[(l + e v )/2]. Resolution in the large time limit is straightforward, with the results shown in 
Figure ITU1 The agreement with numerics is good, especially as K grows. 

The relatively simple structure of XORSAT makes possible the test of some of the approximations done. We show 
in Appendix [C] that the annealed hypothesis done in the calculation of the evolution matrix A is justified in the 
thermodynamic limit. The validity of this approximation in the case of 3-SAT (Section IIV Bp is not established for 
finite K. 
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FIG. 10: Large deviations for the K-XORSAT problem for K = 3 (left) and K = 5 (right). The logarithm ( of the probability 
of successful resolution (over the linear time scale) is plotted as a function of the ratio a of clauses per variables. Diamonds 
corresponds to (minus) the logarithm £ of the average resolution times (averaged over 10,000 samples, divided by N and 
extrapolated to N — > oo) obtained from numerical simulations. Error bars are smaller than the size of the diamond symbol. 



As for X-SAT, quantities of interest have a well defined large a, K limit provided the ratio a* 
is kept fixed; 

1 1 



V* (a*,t*) = Jim jp (a*/K,K,t*) = ^ + i^(e- 2a * t ' -1) 

dy z(y,a*) 



if— >ao 1 2 2a* 



C*(a*) = lim KC,{a*/K,K) 

K ' oo 



= --(a*-l) 2 + 0((a*-l) 3 ) , 



a/ad(K) = a K 

(60) 
(61) 



where z(y, a*) = (y — a*(e v — l))/(e y — e v ) and y(a*) is the negative root of z. 



V. CONCLUSION AND PERSPECTIVES 



In this paper, we have studied the dynamics of resolution of a simple procedure for the satisfaction of Boolean 
constraints, the Random WalkSAT algorithm. We have shown using complementary techniques (expansions and 
approximations) that, for randomly drawn input instances, RandomWalkSAT may have two qualitatively distinct 
behaviours. Instances with small ratios a of clauses per variable are almost surely solved in a time growing linearly 
with their size. On the contrary, for ratios above a threshold ad, the dynamics gets trapped for an exponentially large 
time in a region of the configuration space with a positive fraction of unsatisfied clauses. A solution is finally reached 
through a large fluctuation from this metastable state. 

The freezing taking place at ay does not seem to be related to the onset of clustering between solutions pi). 
Indeed the value of ad is expected to change with the local search rules. It would be interesting to pursue the study 
initiated in the present work to understand if and how the existence of this dynamical threshold is related to some 
property of the (static) energy landscape, as in mean field models of spin glasses [H). Another useful improvement 
would be to go beyond the Markovian approximation of Section IIVI Unfortunately, keeping a finite (with respect 
to N) number of retarded terms in l|27|) should not be sufficient to achieve this goal. Improvements will require to 
take into account an extensive number of terms, or to extend the quantum formalism of Section III to the study of 
the metastable phase. Another possible direction of research would be to use projection operators on observables 
of extensive dimension 16, 39] . Our Markovian approximation, expected to be exact in the large K limit, should 
be a starting point for a systematic large K expansion of the quantities of interest (plateau height, life time of the 
metastable regime, ...). Finally, extension of our analysis to more sophisticated local search heuristics would be useful. 
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APPENDIX A: GENERATING FUNCTION FOR THE AVERAGE EVOLUTION 

Defining (p — ((p , . . . , (fx) = J2m Prob[M, T]M/M and the reduced time t = T/M, one get from eqn (J20J), 

-f = V + OtKW r ■ if , 

at 

with v a K + 1-dimensional vector and W r a (K + 1) x (K + 1) matrix defined as 



(Al) 



1 


V o j 



W r 



( -I Tt 

1 -1 f 

E=l -l A o 

K 1 K u 



V o o oo 



o \ 



-1/ 



(A2) 



To solve equation (|A1|) . it results convenient to introduce $>(x,t), the polynomial in x whose coefficients are the 
fractions ifi we want to determine, 



The set of K + 1 linear coupled differential IjAlfl reduces to a partial differential equation on $(x, t) : 

— (x,t) = -l + x + aK(x - l)$(x,t) + all - x 2 ) — (x,t) 
at ox 



(A3) 



(A4) 



At initial time, the variables are chosen randomly to be true or false, without any correlation with the formula studied. 

i /ir 



Thus, the number of satisfied literals in a clause obeys a binomial law with parameter 1/2: ^-(0) = Tnri 1 ^). The 



initial condition on $ reads thus 



$(x,0) 



l + x 



K 



(A5) 



Setting ^(x, t) — Q(x, t) + l/(aK), the constant terms in eqn (| A4|) can be eliminated, with the resulting PDE for 

(A6) 



— (x, t) - aK(x - l)*(x, t) + a(l - x 2 )-^{x, t) 



This can be transformed into a wave equation on x( x > t) = (1 + x ) ^(x,t) 



dx 
dt 



(x,t) =a(l-x ) — (x,t) 



This equation is solved in terms of an arbitrary function of a single argument, 

x(x, t) = u ( — tanh _1 (x) + t 



(A7) 



(A8) 
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Knowledge of 0) for all x (|A5|I allows us to determine unambiguously ui, 



lu{u) = ^K+^i 1 + tanh(an)] K 
Going backwards, we obtain the expression of the generating function 

K 



1 + X 



1 

~aK 



1 + x tanh(ai) 



1 + tanh(at) 



K 



and of the fractions of clauses of type i through an expansion of the latter in powers of x, 



ipj(t) 



1 



1 



(tanh(ai))-? 



2 K aK (1 + tanh(crf)) 



K 



S j0 

aK 



(A9) 



(A10) 



(All) 



APPENDIX B: PERTURB ATIVE RESOLUTION OF THE LARGE DEVIATION PDE 

In this appendix we sketch the resolution of PDE (|47|l in the long time limit, where the function g becomes 
independent of time. We expand it in powers of its arguments: 



K K 

(y, t -> oo) = ^2 a tVt a i]ViVj + 



(Bl) 



i,j=0 



Plugging this expansion into eqn. (|47|l . one obtains by identification of the monomials in y^ an infinite set of linear 
equations on the coefficients of g, which can be solved order by order. Constraint ip^ = 1 imposes a condition on 
9, 



g(y + cl) = ac + g(y) 



(B2) 



where 1 = (1, 1, . . . , 1) and c is an arbitrary constant. 

In the case K — 3, we have solved the set of equations on the coefficients of g up to order 4 in the yiS with the help 
of a symbolic computation software. To calculate £ wc need to know g as a function of yo only, with g/j = OVi > 1. 
We find 



g{yo) 



3a — 7 



24 



-2/o 



105a - 94 
1920 



-Ik) 



, t 26460a + 10753 3 29645a + 66244 4 



193560 



-Va 



18923520 



-Vo + O(y d ) 



(B3) 



When a > ad(K = 3) = 7/3, this function has a non trivial extremum, in which g takes the value C( a )- 

We now explain the resolution at quadratic order for a generic value of K . At linear order, following the calculation 
exposed in Appendix A, 



a i = alun o < Pi (t) = ±( K . 



$i0 

K 



(B4) 



In particular ao = (a — ad(K))/2 K . Then considering the monomials of second order in the expansion of the equation, 
a set of (K + 1)(K + 2)/2 linear equations determine the coefficients ay. As we shall not try to solve the equation at 
higher orders for this generic case, we need only aoo- Again we introduce a generating function to turn the discrete 
algebraic problem into an analytic one. f(s) = J2i j a ijS l+ '' obeys the ordinary differential equation, 



2Kf(s) - (s + l)f'(s) = (s - 1) 1 - (1 + aK) 



1 + s u 



with the condition /(l) = stemming from l|B2|) . This equation can be easily solved, yielding 



aoo = /(0) 



>2K 



(K - l)2 2K + 2 K + 2K - K2 K+1 
K(2K- 1)(2 K - 1) 



(B5) 



(B6) 



At this order of the expansion, the extremum of g in the subspace t/,- = 0, Vi > 1 is reached in yo — — ao/aoo and leads 
toC(a,^) = -^/(2a o). 
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APPENDIX C: VALIDITY OF THE ANNEALED HYPOTHESIS FOR THE XORSAT PROBLEM 



We justify in this appendix the annealed average of the Markovian transition matrix in the XORSAT case. Our 
analysis is based on Chebyshev inequality jijj: a positive integer valued random variable with a variance negligible 
with respects to the square of its average is sharply peaked around its mean value. Call Du = J2 S S(U — U(S)) 
the number of configurations with U unsatisfied clauses. The first moment of Djj over the distribution of XORSAT 
instances is easy to compute. After averaging, the 2 N configurations of the variables contribute equally to the sum; 
for each of them, the number U of unsatisfied clause has a binomial distribution of parameter 1/2 among the M 
clauses. In the thermodynamic limit, using Stirling's formula and denoting p>Q — U/M, 

[Du]~e Nh ^ ^ , / 1 (^ ,a) = ln2 + a(-^oln^o-(l-Vo)ln(l-^o)-ln2) (CI) 

up to polynomial corrections. Suppose that po < 1/2. Call p>Q (a) the root of fa at fixed a. It is a growing function 
of a, vanishing for a < 1, and monotonously increasing to 1/2 as a gets large. For tpo > <Po fi is positive, and 
[Du] exponentially large. When ip < ip^' (a), [Du] is exponentially small. 

Consider now the second moment [Dfj] and its leading behaviour [D v ] ~ exp[iV/2(<^o> &)]■ We introduce the 
generating function 



U " u "" Si,S 2 



C'r^[ 



D 2 i e -2xu = / ^_ e -*(^(Si)+£/(s 2 ))+ie(a(s 1 )-c/(s 2 )) 



(C2) 



The average on the r.h.s. can be readily performed as the M clauses are drawn independently The trace on the two 
configurations reduces to a sum on the Hamming distance between them. Evaluation of this sum and the integral 
over 9 by the Laplace method yields 

ext [fa(p>a, a) — 2axpo] — S(x, a) (C3) 

<po 

where S(x, a) is the maximum over 7 of 

S(^,x, a) = In 2- 7I117— (1 - 7) ln(l - 7) — ax + a hi [1 + p e (j)(coshx - 1)] (C4) 

and p e (j) = (1 + (1 — 2-f) K )/2 is the probability that a randomly drawn clause satisfies (or violates) two configurations 
at Hamming distance d = 7JV. We are thus left with the problem of determining S(x, a) and of computing its Legendre 
transform with respects to x to obtain fi- As the derivative of p e in 7 = 1/2 vanishes, this point is always an extremum 
of S. Two cases must be distinguished, depending on the value of ipo, which fixes a; |2a.l3a .l41 | : 

If 7 = 1/2 is the global maximum of S, then f2(<fio, a) = 2fi(ipo, a), in other words [D v ] ~ [-Da] 2 - In that case 
it is possible to compute the polynomial corrections by expanding around the saddle point, 



[D 



(1-4^(1-^)) (C5) 



[Du] 2 N K - 2 

• If the global maximum of S is not in 7 = 1/2, fa > fi, thus [D v ] 3> [Du] 2 ■ 

We have computed numerically for K = 3 the function ip (a) such that for tp Q > ip^ (a) , the global maxima of S is 
located in 7 = 1/2. It is a growing function of a, vanishing when a < 0.889 |36| . and growing monotously to 1/2 when 
a diverges. The results are shown in Fig. 1111 fall three curves reaches po = 1/2 when a — * 00 without crossing each 
other). In the course of the algorithm operation, po decreases from its initial value (1/2) down to its plateau value, 
and remains confined to the region in the phase diagram where the second moment method applies: ipo > p>^ > P?^- 
This proves that, within the Markovian approximation, the annealed average is correct: as the denominator of the 
transition matrix is peaked around its mean value, the numerator and denominator can be averaged separately. This 
analysis cannot been done in the case of A"-SAT, for which the second moment fails as soon as a > 0[41|. 
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FIG. 11: Study of the moments of Du for 3-XORSAT. Solid line: Markovian annealed prediction for the asymptotic fraction 
of unsat clauses ipo. Long-dashed line: (p^ . Short-dashed line: (p^ . Symbols: asymptotic fraction of unsat clauses on the 
plateau, obtained through numerical simulations. 
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